This code can be found at https://github.com/libjohn/workshop_rfun_flipped/ggplot2_quick.Rmd

Load library packages

I only need ggplot2, but I like to load tidyverse because it includes 8 complementary packages, including ggplot2.

Get more information from:

# library(ggplot2)
library(tidyverse)

The ggplot2 template is used to identify the dataframe, identify the x and y axes, and define the visualized layers.

ggplot(data = ---, mapping = aes(x = ---, y = ---)) +
  geom_----()

Note: ---- is meant to imply text (function names, dataframe names, variable names) you supply.

It is helpful to see the named arguments, above. In practice, rather than typing the formal argument names, the code is usually shortened to this:

dataframe %>% 
  ggplot(aes(xvar, yvar)) +
  geom_----() 
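As a concrete illustration of the two templates, here is a minimal sketch that fills in the blanks with the mpg dataframe that ships with ggplot2. The choice of mpg, displ, and hwy is only an assumption for the example, and the object names p_formal and p_short are made up here.

```r
library(ggplot2)
library(dplyr)

# Formal version: every argument is named.
p_formal <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Shorthand version: the dataframe is piped in and the
# arguments are matched by position.
p_short <- mpg %>%
  ggplot(aes(displ, hwy)) +
  geom_point()

# Both versions compute the same points for the first layer.
all.equal(layer_data(p_formal), layer_data(p_short))
```

Printing either p_formal or p_short draws the same scatter plot.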

Goal

Visualize a scatter plot showing the relationship of mass to height for Star Wars characters in the dplyr::starwars dataframe, excluding the heaviest character. Indicate a linear regression line.

Import data

dplyr ships with a built-in dataset, starwars.

data(starwars)
starwars
## # A tibble: 87 x 14
##    name  height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
##  1 Luke~    172    77 blond      fair       blue            19   male  mascu~
##  2 C-3PO    167    75 <NA>       gold       yellow         112   none  mascu~
##  3 R2-D2     96    32 <NA>       white, bl~ red             33   none  mascu~
##  4 Dart~    202   136 none       white      yellow          41.9 male  mascu~
##  5 Leia~    150    49 brown      light      brown           19   fema~ femin~
##  6 Owen~    178   120 brown, gr~ light      blue            52   male  mascu~
##  7 Beru~    165    75 brown      light      blue            47   fema~ femin~
##  8 R5-D4     97    32 <NA>       white, red red             NA   none  mascu~
##  9 Bigg~    183    84 black      light      brown           24   male  mascu~
## 10 Obi-~    182    77 auburn, w~ fair       blue-gray       57   male  mascu~
## # ... with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>

Steps to Visualization

Draw the base layer

This feels like, and looks like, you drew an empty box.

starwars %>% 
  ggplot() 

But wait, there’s more….

Map the aesthetics to variables in the dataframe

Still doesn’t look like much. You will initialize the plot scales and labels based on the values of the variables in the dataframe.

starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(height, mass))

In the above, I subset the data with dplyr::filter(), keeping only Star Wars characters weighing less than 500 kg. Because filter() keeps only rows where the condition is TRUE, characters with a missing mass are dropped as well. Then I initialized the base layer with height as the x axis and mass as the y axis. ggplot drew the scales for me.
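To see what the filter step actually removes, the following sketch counts the rows on each side of the condition. The variable names (n_total, n_missing, n_heavy, n_kept) are hypothetical helpers for this example only.

```r
library(dplyr)

n_total   <- nrow(starwars)
n_missing <- starwars %>% filter(is.na(mass)) %>% nrow()
n_heavy   <- starwars %>% filter(mass >= 500) %>% nrow()
n_kept    <- starwars %>% filter(mass < 500) %>% nrow()

# filter() keeps only rows where the condition is TRUE,
# so rows with a missing mass are dropped along with the heavy ones.
c(total = n_total, missing_mass = n_missing, heavy = n_heavy, kept = n_kept)

# Which characters does the 500 kg cutoff exclude?
starwars %>%
  filter(mass >= 500) %>%
  select(name, mass)
```

The three groups (kept, heavy, missing) add up to the full dataframe, which is a quick way to check that a filter did what you expected.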

Visualize a layer

Since I have two numeric variables, height and mass, I’ll start with a scatter plot. Scatter plots are generated by the geom_point() function.

starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(height, mass)) +
  geom_point() 

Global v local arguments

  • Mapping aesthetics inside ggplot() makes them global: every layer inherits them (above)
  • Aesthetics can also be mapped locally, inside an individual layer function

aes() arguments mapped locally in geom_point()

starwars %>% 
  filter(mass < 500) %>% 
  ggplot() +
  geom_point(aes(height, mass)) 
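The difference between global and local mapping becomes visible once a plot has more than one layer. The sketch below is an illustration under my own assumptions: it adds geom_smooth() (covered later in this document) as a second layer, and the names sw, p_global, and p_local are invented for the example.

```r
library(ggplot2)
library(dplyr)

sw <- starwars %>% filter(mass < 500)

# Global mapping: both layers inherit x = height and y = mass.
p_global <- sw %>%
  ggplot(aes(height, mass)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# Local mapping: nothing is inherited, so each layer
# must be given its own aes().
p_local <- sw %>%
  ggplot() +
  geom_point(aes(height, mass)) +
  geom_smooth(aes(height, mass), method = "lm", se = FALSE)

names(p_global$mapping)   # the global aesthetics
length(p_local$mapping)   # no global aesthetics
```

Both objects draw the same picture. Global mapping saves typing when layers share aesthetics, while local mapping is handy when layers need different ones.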

Mapping v Setting

Many arguments can be mapped to a variable inside aes(), so that the visual property varies with the values of that variable, OR set to a fixed value outside the aes() function but inside the geom_ function.

Aesthetic arguments include:

  • color
  • fill
  • size
  • linetype
  • alpha (opacity)
  • shape
  • and more; see the documentation for each geom_

Mapping: color is mapped inside aes() function

starwars %>% 
  filter(mass < 500) %>% 
  ggplot() +
  # geom_point(mapping = aes(x = height, y = mass, color = gender))
  geom_point(aes(height, mass, color = gender))

Notice the legend was drawn automatically, above, by mapping an aesthetic.

Setting: color set outside the aes() function

starwars %>% 
  filter(mass < 500) %>% 
  ggplot() +
  geom_point(aes(height, mass), color = "goldenrod")
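A common slip is to put a literal color inside aes(). The sketch below contrasts the two cases. It inspects the layer object directly, which relies on ggplot2 internals (aes_params and mapping), so treat that part as an illustration rather than a stable interface. The names sw, p_set, and p_slip are hypothetical.

```r
library(ggplot2)
library(dplyr)

sw <- starwars %>% filter(mass < 500)

# Setting: the literal color is passed outside aes(),
# so every point is drawn in goldenrod.
p_set <- ggplot(sw) +
  geom_point(aes(height, mass), color = "goldenrod")

# Slip: inside aes(), "goldenrod" is treated as data (a one-value variable).
# ggplot2 then picks a default palette color and draws a legend.
p_slip <- ggplot(sw) +
  geom_point(aes(height, mass, color = "goldenrod"))

# The set value is stored as a layer parameter ...
p_set$layers[[1]]$aes_params
# ... while the mapped one is stored in the layer's mapping.
names(p_slip$layers[[1]]$mapping)
```

If a plot shows an unexpected color and a one-entry legend, a literal value inside aes() is a likely cause.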

Common geom_ functions

Type Geom
Bar graph: geom_bar() geom_col()
Histogram: geom_histogram()
Scatter plot: geom_point() geom_jitter()
Line graph: geom_line()
Box plot: geom_boxplot()
Density: geom_density() geom_violin()
Heat map: geom_tile() geom_raster()
Mapping: geom_sf()
Regression line: geom_smooth()

A list of available geom_ functions, or layers, can be found in the help or on the website: https://ggplot2.tidyverse.org/reference/index.html#section-geoms
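Swapping the geom_ function is usually all it takes to change the chart type. As one more example from the list, here is a minimal histogram sketch using geom_histogram() on the starwars heights. The binwidth of 10 (cm) and the name p_hist are arbitrary choices for illustration.

```r
library(ggplot2)
library(dplyr)

# A histogram needs only an x aesthetic; the counts are computed for you.
p_hist <- starwars %>%
  filter(!is.na(height)) %>%
  ggplot(aes(height)) +
  geom_histogram(binwidth = 10)

# The computed bins: one row per bar, with its count.
head(layer_data(p_hist)[, c("xmin", "xmax", "count")])
```

Printing p_hist draws the plot. The bar heights sum to the number of characters with a recorded height.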

Boxplot

starwars %>% 
  mutate(species = fct_lump_min(species, 2)) %>% 
  ggplot(aes(species, height)) +
  geom_boxplot() 
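The boxplot above relies on forcats::fct_lump_min() to collapse rare species into a single group. A small sketch on a toy vector (made up for this example, not the real starwars data) shows the effect:

```r
library(forcats)

# Toy data: two species appear twice, two appear only once.
species_toy <- c("Human", "Human", "Droid", "Droid", "Wookiee", "Ewok")

# Levels seen fewer than `min` times are lumped into "Other".
lumped <- fct_lump_min(species_toy, min = 2)

levels(lumped)
# "Droid" "Human" "Other"
table(lumped)
```

In the boxplot, this keeps the x axis readable: every species with a single character lands in one "Other" box instead of getting its own.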

Line graph

babynames::babynames %>% 
  filter(name == "Watts") %>% 
  ggplot(aes(year, n)) +
  # geom_point() +
  geom_line()

Multiple layers

Each layer can support local arguments and draw from the global settings. Below we use the geom_line() function, followed by the geom_point() function.

babynames %>%
  ggplot(aes(year, prop)) +
  geom_line(aes(color = sex)) +
  geom_point(alpha = 0.4, shape = "cross")

But there is more to that graph. Here’s the full code for the above graph.

library(babynames)
library(ggplot2)
library(dplyr)    # for filter() and %>%

babynames %>% 
  filter(name == "John" & sex == "M" | 
           name == "Elizabeth" & sex == "F") %>% 
  ggplot(aes(year, prop)) +
  geom_line(aes(color = sex)) +
  geom_point(alpha = 0.4, shape = "cross") +
  geom_text(data = . %>% filter(year == 1965), aes(label = name),
            nudge_y = .009) +
  labs(title = "Name Popularity") + 
  theme(legend.position = "none")

Goal

Recall the goal mentioned in the beginning. We want a scatter plot and a regression line. This can be accomplished by adding a layer in the form of another geom_ function: geom_smooth()

starwars %>% 
  filter(mass < 500) %>% 
  ggplot(aes(height, mass)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE)
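To see the numbers behind the line, the same kind of model can be fit directly. This is a sketch under the assumption that geom_smooth(method = lm) uses its default formula, y ~ x, which here corresponds to mass ~ height. The names sw and fit are hypothetical.

```r
library(dplyr)

sw <- starwars %>% filter(mass < 500)

# Ordinary least squares: mass as a linear function of height.
# lm() silently drops rows with a missing height or mass.
fit <- lm(mass ~ height, data = sw)

# Intercept and slope of the regression line
# (the slope is in kg per cm of height).
coef(fit)
```

The slope reported by coef(fit) is the steepness of the line drawn by geom_smooth() in the plot above.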

Arrange order

Categorical values are most easily ordered with the forcats library. Part of the tidyverse, forcats will convert string data into factors, i.e. categorical data. This enables ordering.

msleep %>% 
  ggplot(aes(vore)) +
  geom_bar()

forcats::fct_infreq()

Change the order of the bars by the frequency of observations.

msleep %>% 
  ggplot(aes(fct_infreq(vore))) +
  geom_bar() 
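What fct_infreq() does is easiest to see on the factor levels themselves. The vector below is a toy example invented for illustration. fct_rev(), which reverses the level order, is included because it pairs naturally with fct_infreq().

```r
library(forcats)

vore_toy <- c("herbi", "carni", "herbi", "omni", "herbi", "carni")

# Default factor levels are alphabetical.
lv_default <- levels(factor(vore_toy))
lv_default
# "carni" "herbi" "omni"

# fct_infreq() orders levels from most to least frequent.
lv_infreq <- levels(fct_infreq(vore_toy))
lv_infreq
# "herbi" "carni" "omni"

# fct_rev() flips whatever order is in place.
lv_rev <- levels(fct_rev(fct_infreq(vore_toy)))
lv_rev
# "omni" "carni" "herbi"
```

ggplot2 draws discrete axes in level order, so changing the levels changes the order of the bars.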

Notice below, we use the fill = argument to set the color of the bar. In the scatter plot, above, we used the color = argument. For many geoms you can use both color and fill. How do these arguments differ? Where can you look to find out more about fill and color?

starwars %>% 
  ggplot(aes(fct_rev(fct_infreq(eye_color)))) +
  geom_bar(fill = "grey70") +
  geom_bar(data = starwars %>% filter(eye_color == "orange"), fill = "darkorange") +
  coord_flip()

Facet wrap

Faceting is a great way to make subplots of the same dataframe. See both facet_wrap() and facet_grid().

mpg %>% 
  ggplot(aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ class)
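For comparison, here is a minimal facet_grid() sketch on the same data. Where facet_wrap() wraps one variable into a ribbon of panels, facet_grid() lays panels out as rows by columns from two variables. The choice of drv and cyl and the name p_grid are assumptions made for this example.

```r
library(ggplot2)
library(dplyr)

# Rows are drive types (drv), columns are cylinder counts (cyl).
p_grid <- mpg %>%
  ggplot(aes(displ, hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
```

Printing p_grid draws the grid. Combinations with no cars appear as empty panels, which is itself informative.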

Scales

I’ll briefly introduce the use of scales. In this case, scales are used to affect the color of the plot. Read more about scales.

Viridis scales apply color palettes to continuous, discrete, or binned data.

msleep %>% 
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_viridis_d(na.value = "grey80")

The color brewer palette is similar but has a wider array of palettes to choose from.

msleep %>% 
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_brewer(type = "qual", na.value = "grey80") 

To find available colors: Google search “R color names”, or specific to ColorBrewer….

#display.brewer.pal(7,"Dark2")
RColorBrewer::display.brewer.all()

Sometimes a manual scale is preferred. I like to google-search: “R color names” for helpful documentation.

mycolors <- c("firebrick", "forestgreen", "navy", "darkorange", 
               "goldenrod", "sienna")

msleep %>% 
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_manual(values = mycolors, na.value = "grey80") 
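With an unnamed vector, colors are assigned to the levels in order. One way to make the pairing explicit is a named vector, which scale_fill_manual() matches by name. The sketch below derives the names from the data rather than typing them, and it assumes the number of colors equals the number of conservation levels. The names cons_levels, named_colors, and p_manual are hypothetical.

```r
library(ggplot2)
library(dplyr)
library(forcats)

mycolors <- c("firebrick", "forestgreen", "navy", "darkorange",
              "goldenrod", "sienna")

# The distinct conservation levels, sorted; sort() drops the NA.
cons_levels <- sort(unique(msleep$conservation))

# Pair each level with a color by name.
named_colors <- setNames(mycolors, cons_levels)
named_colors

p_manual <- msleep %>%
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_manual(values = named_colors, na.value = "grey80")
```

Printing named_colors shows which color goes with which level, and the pairing holds even if the level order changes later.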

Scales are used to manipulate the visual properties of the data. Beyond using scales to modify colors, another example is logarithmic scales to account for data skew. In this way you can clarify the data pattern. For example, using the ChickWeight dataset, we visualize the weights of the chicks over time.

data("ChickWeight")

ChickWeight %>% 
  ggplot(aes(Time, weight, color = Diet)) +
  geom_line(aes(group = Chick))

Using scale_y_log10() we can alter the scale to highlight a more understandable data pattern.

chicken_plot <- ChickWeight %>% 
  ggplot(aes(Time, weight, color = Diet)) +
  geom_line(aes(group = Chick)) +
  scale_y_log10()
chicken_plot
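Why a log scale helps can be shown with a little arithmetic: on a log10 axis, equal ratios take up equal distances. The weights below are made-up numbers chosen so that each value doubles the previous one.

```r
weights <- c(40, 80, 160, 320)

# On the raw scale, each doubling is a bigger step than the last.
step_raw <- diff(weights)
step_raw
# 40 80 160

# On the log10 scale, each doubling is the same step, log10(2).
step_log <- diff(log10(weights))
step_log
all.equal(step_log, rep(log10(2), 3))
```

Growth at a constant rate therefore shows up as a straight line on the log scale, which is why the chick growth curves look more regular after scale_y_log10().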

Labels

The labs() function is a specialized scales function, used to apply labels. For example, use the labs() function to add a title, subtitle, legend title, modify axis labels, and set a caption. See more on scales.

plot_sleep <- msleep %>% 
  mutate(vore = case_when(
    vore == "herbi" ~ "Herbivore",
    vore == "omni"  ~ "Omnivore",
    vore == "carni" ~ "Carnivore",
    vore == "insecti" ~ "Insectivore"
  ))  %>%
  ggplot(aes(fct_infreq(vore), sleep_total)) +
  geom_col(aes(fill = conservation)) +
  scale_fill_brewer(type = "qual", na.value = "grey80") +
  labs(title = "Animal sleep times", 
       subtitle = "A practice dataset",
       fill = "Conservation\nType",
       x = "",
       y = "Sleep time in hours",
       caption = "Source: ggplot2::msleep")

plot_sleep

Themes

Themes are used to manipulate the stylistic characteristics of the non-data components of your plot, such as font faces, text sizes, and grid lines. ProTip: quickly manipulate a single plot with preset themes such as theme_dark, or use a specialized theme extension such as theme_ipsum from the hrbrthemes package.

See more on themes

plot_sleep +
  theme_dark()

plot_sleep +
  theme_classic()
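Preset themes can be fine-tuned by adding a theme() call after them. The sketch below uses the mpg dataframe so that it runs on its own. The particular tweaks (legend at the bottom, no minor grid lines) and the name p_theme are arbitrary examples.

```r
library(ggplot2)
library(dplyr)

p_theme <- mpg %>%
  ggplot(aes(displ, hwy, color = drv)) +
  geom_point() +
  theme_minimal() +                            # start from a preset theme
  theme(legend.position = "bottom",            # then override single elements
        panel.grid.minor = element_blank())
```

Order matters: a preset theme added after theme() would overwrite the individual tweaks, so put theme() last.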

hrbrthemes

https://cinc.rud.is/web/packages/hrbrthemes/

plot_sleep +
  hrbrthemes::theme_ipsum(grid = "Y") +
  hrbrthemes::scale_fill_ipsum(na.value = "grey80",
                               labels = c("Critical", "Domesticated", 
                                          "Endangered", "Least Concern", 
                                          "Threatened", "Vulnerable")) +
  theme(plot.title.position = "plot")

Combine plots

The patchwork package makes it “ridiculously simple to combine separate ggplot objects into the same graphic.” See more about patchwork

# install.packages("patchwork")
# https://patchwork.data-imaginist.com/
library(patchwork)

(plot_sleep / chicken_plot)

Interactive plots

Use the ggplotly function to transform your static plot into an interactive plot that can be used in dashboards and web presentations.

See more at the Plotly ggplot2 Library page, and the Interactive web-based data visualization with R, plotly, and shiny book.

library(plotly)
ggplotly(plot_sleep)

Animate plots

Use the gganimate package to bring your plot to life through the wonders of animation. Learn more at the resource page for gganimate.


Reinforce your learning

On your own…

Interactive Exercises from RStudio Primers – Visualization

Angela Zoss code exercises

Resources

Designing effective visualizations by Dr. Mine Çetinkaya-Rundel - Introduction to Data Science https://introds.org

Data Visualization: A Practical Introduction. Kieran Healy